Tag: #human-level reasoning


ARC-AGI-3 offers $2M to any AI that matches untrained humans, yet every frontier model scores below 1%

The ARC-AGI-3 benchmark challenges AI systems to match the performance of untrained humans in interactive environments, and no frontier model has yet exceeded 1% success. The test strips away the advantages AI systems typically rely on, exposing a gap in reasoning and adaptability.

Mar 2690